Auto

Ch 02 - Q9 (applied)
Description
Gas mileage, horsepower, and other information for 392 vehicles.

Source
This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.
The dataset was used in the 1983 American Statistical Association Exposition.

References
This dataset is part of the course material for the book An Introduction to Statistical Learning with Applications in R
(Ch 02 - Statistical Learning - Applied Exercises - Problem 9)

Short description of variables

  • mpg : miles per gallon
  • cylinders : Number of cylinders between 4 and 8
  • displacement : Engine displacement (cu. inches)
  • horsepower : Engine horsepower
  • weight : Vehicle weight (lbs.)
  • acceleration : Time to accelerate from 0 to 60 mph (sec.)
  • year : Model year (modulo 100)
  • origin : Origin of car (1. American, 2. European, 3. Japanese)
  • name : Vehicle name
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

1) Load packages

In [1]:
In [2]:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2) Import Data

In [3]:
Out[3]:
True
In [4]:
(397, 9)
Out[4]:
mpg cylinders displacement horsepower weight acceleration year origin name
0 18.0 8 307.0 130 3504 12.0 70 1 chevrolet chevelle malibu
1 15.0 8 350.0 165 3693 11.5 70 1 buick skylark 320
2 18.0 8 318.0 150 3436 11.0 70 1 plymouth satellite
In [5]:
Out[5]:
0
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

3) Data preparation

In [6]:
In [7]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 397 entries, 0 to 396
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   mpg           397 non-null    float64
 1   cylinders     397 non-null    int64  
 2   displacement  397 non-null    float64
 3   horsepower    397 non-null    object 
 4   weight        397 non-null    int64  
 5   acceleration  397 non-null    float64
 6   year          397 non-null    int64  
 7   origin        397 non-null    int64  
 8   name          397 non-null    object 
dtypes: float64(3), int64(4), object(2)
memory usage: 28.0+ KB

The horsepower column holds numbers but has been read with dtype 'object', which is a red flag: pandas uses 'object' for columns that contain strings, or a mix of strings and numbers. This column needs to be examined further.
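One way to examine such a column is to coerce it to numbers and look at the entries that fail to parse. This is a minimal sketch, not necessarily the code in the cell that follows: the frame name `auto` is hypothetical, and the four rows are copied from the outputs shown in this notebook.

```python
import pandas as pd

# Small stand-in for the full dataset; `auto` is a hypothetical name.
auto = pd.DataFrame({
    "mpg": [18.0, 15.0, 25.0, 21.0],
    "horsepower": ["130", "165", "?", "?"],
    "name": ["chevrolet chevelle malibu", "buick skylark 320",
             "ford pinto", "ford maverick"],
})

# Coerce to numbers; anything that cannot be parsed becomes NaN.
parsed = pd.to_numeric(auto["horsepower"], errors="coerce")

# Rows whose entry was present in the file but is not a number.
bad_rows = auto[parsed.isna() & auto["horsepower"].notna()]
print(bad_rows)
```

The same mask also reveals which placeholder was used for missing values, here the string "?".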

In [8]:
Out[8]:
mpg cylinders displacement horsepower weight acceleration year origin name
32 25.0 4 98.0 ? 2046 19.0 71 1 ford pinto
126 21.0 6 200.0 ? 2875 17.0 74 1 ford maverick
330 40.9 4 85.0 ? 1835 17.3 80 2 renault lecar deluxe
336 23.6 4 140.0 ? 2905 14.3 80 1 ford mustang cobra
354 34.5 4 100.0 ? 2320 15.8 81 2 renault 18i

Only 5 of the 397 rows have a missing horsepower value (recorded as '?'), so these rows can simply be deleted.
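Dropping the placeholder rows and converting the column to integers could look like the sketch below. The frame name `auto` and the four sample rows are assumptions for illustration; the cells that follow may do this differently.

```python
import pandas as pd

# Small stand-in for the full dataset; `auto` is a hypothetical name.
auto = pd.DataFrame({
    "mpg": [18.0, 15.0, 25.0, 21.0],
    "horsepower": ["130", "165", "?", "?"],
})

# Keep only rows whose horsepower is not the "?" placeholder.
clean = auto[auto["horsepower"] != "?"].copy()

# Parse the remaining strings and store them as 64-bit integers.
clean["horsepower"] = pd.to_numeric(clean["horsepower"]).astype("int64")

print(clean.shape)
print(clean["horsepower"].dtype)
```

An alternative is to pass `na_values="?"` to `pd.read_csv` when loading the file, so the placeholder is read as NaN, and then call `dropna()`.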

In [9]:
Out[9]:
mpg cylinders displacement horsepower weight acceleration year origin name
32 25.0 4 98.0 ? 2046 19.0 71 1 ford pinto
126 21.0 6 200.0 ? 2875 17.0 74 1 ford maverick
330 40.9 4 85.0 ? 1835 17.3 80 2 renault lecar deluxe
336 23.6 4 140.0 ? 2905 14.3 80 1 ford mustang cobra
354 34.5 4 100.0 ? 2320 15.8 81 2 renault 18i
In [10]:
Out[10]:
mpg cylinders displacement horsepower weight acceleration year origin name
32 25.0 4 98.0 ? 2046 19.0 71 1 ford pinto
126 21.0 6 200.0 ? 2875 17.0 74 1 ford maverick
330 40.9 4 85.0 ? 1835 17.3 80 2 renault lecar deluxe
336 23.6 4 140.0 ? 2905 14.3 80 1 ford mustang cobra
354 34.5 4 100.0 ? 2320 15.8 81 2 renault 18i
In [11]:
Out[11]:
(392, 9)
In [12]:
Out[12]:
dtype('int64')
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

(a) Which of the predictors are quantitative, and which are qualitative?

Quantitative → numerical values.
Qualitative → values in one of K different classes, or categories.

In [13]:
In [14]:
mpg : 127
cylinders : 5
displacement : 81
horsepower : 93
weight : 346
acceleration : 95
year : 13
origin : 3
name : 301
variable      description                                             variable type
mpg           miles per gallon                                        quantitative
cylinders     Number of cylinders between 4 and 8                     qualitative or categorical
displacement  Engine displacement (cu. inches)                        quantitative
horsepower    Engine horsepower                                       quantitative
weight        Vehicle weight (lbs.)                                   quantitative
acceleration  Time to accelerate from 0 to 60 mph (sec.)              quantitative
year          Model year (modulo 100)                                 quantitative
origin        Origin of car (1. American, 2. European, 3. Japanese)   qualitative or categorical
name          Vehicle name                                            qualitative or categorical

"year" can be treated as quantitative, since it may indirectly reflect the technology available at the time; otherwise it can be treated as qualitative (categorical).
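The per-column counts of distinct values listed in this part help separate the two kinds of variable: a column with only a handful of distinct values is a candidate for a qualitative variable. A minimal sketch of how such counts can be produced, using a hypothetical frame `auto` built from the three rows shown in the import step:

```python
import pandas as pd

# Three rows copied from the notebook output; `auto` is a hypothetical name.
auto = pd.DataFrame({
    "mpg": [18.0, 15.0, 18.0],
    "cylinders": [8, 8, 8],
    "origin": [1, 1, 1],
    "name": ["chevrolet chevelle malibu", "buick skylark 320",
             "plymouth satellite"],
})

# Number of distinct values in each column.
unique_counts = auto.nunique()
for column, count in unique_counts.items():
    print(f"{column} : {count}")
```

A low count alone does not decide the type (name has many distinct values but is still qualitative), so the counts are a guide rather than a rule.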

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

(b) What is the range of each quantitative predictor?

In [15]:
Out[15]:
['mpg', 'displacement', 'horsepower', 'weight', 'acceleration', 'year']
In [16]:
Out[16]:
mpg displacement horsepower weight acceleration year
min 9.0 68.0 46.0 1613.0 8.0 70.0
max 46.6 455.0 230.0 5140.0 24.8 82.0
range 37.6 387.0 184.0 3527.0 16.8 12.0
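A table of this shape can be built by aggregating the minimum and maximum and adding their difference as a third row. This is a sketch on three rows taken from the import step, with `auto` as a hypothetical frame name:

```python
import pandas as pd

# Three rows copied from the notebook output; `auto` is a hypothetical name.
auto = pd.DataFrame({
    "mpg": [18.0, 15.0, 18.0],
    "displacement": [307.0, 350.0, 318.0],
    "weight": [3504, 3693, 3436],
})

# One row per statistic, one column per predictor.
summary = auto.agg(["min", "max"])

# Range = max - min, added as a new row.
summary.loc["range"] = summary.loc["max"] - summary.loc["min"]
print(summary)
```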
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

(c) What is the mean and standard deviation of each quantitative predictor?

In [17]:
Out[17]:
mpg displacement horsepower weight acceleration year
min 9.00 68.00 46.00 1613.00 8.00 70.00
max 46.60 455.00 230.00 5140.00 24.80 82.00
range 37.60 387.00 184.00 3527.00 16.80 12.00
mean 23.45 194.41 104.47 2977.58 15.54 75.98
std 7.81 104.64 38.49 849.40 2.76 3.68
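The mean and standard deviation rows can be computed the same way. One detail worth knowing is that pandas computes the sample standard deviation by default (`ddof=1`, dividing by n - 1). A short sketch on three rows from the import step, with `auto` as a hypothetical name:

```python
import pandas as pd

# Three rows copied from the notebook output; `auto` is a hypothetical name.
auto = pd.DataFrame({
    "mpg": [18.0, 15.0, 18.0],
    "displacement": [307.0, 350.0, 318.0],
})

# Sample mean and sample standard deviation (ddof=1), rounded for display.
stats = auto.agg(["mean", "std"]).round(2)
print(stats)

# Population standard deviation (divide by n) for comparison.
population_std = auto["mpg"].std(ddof=0)
print(round(population_std, 2))
```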
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

(d) Range, mean and standard deviation after removing observations 10-85

In [18]:
Out[18]:
(316, 6)
In [19]:
Out[19]:
True
In [20]:
Out[20]:
mpg displacement horsepower weight acceleration year
min 11.000 68.000 46.000 1649.000 8.500 70.000
max 46.600 455.000 230.000 4997.000 24.800 82.000
range 35.600 387.000 184.000 3348.000 16.300 12.000
mean 24.404 187.241 100.722 2935.972 15.727 77.146
std 7.867 99.678 35.709 811.300 2.694 3.106
In [21]:
Out[21]:
mpg               7711.8
displacement     59168.0
horsepower       31828.0
weight          927767.0
acceleration      4969.7
year             24378.0
dtype: float64
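Removing the 10th through 85th observations drops 76 rows, which matches the 316 rows reported above (392 - 76). The sketch below assumes the removal is done by position rather than by index label, which matters here because the index has gaps after the earlier row deletions. The frame `auto` is a toy stand-in with 392 rows, not the real data.

```python
import numpy as np
import pandas as pd

# Toy frame with 392 rows standing in for the cleaned data (assumption).
auto = pd.DataFrame({"mpg": np.arange(392, dtype=float)})

# 10th through 85th observations (1-based, inclusive) are positions 9..84.
subset = auto.drop(auto.index[9:85])
print(subset.shape)

# Sanity check in the spirit of the column sums above: sum / n equals the mean.
mean_from_sum = subset["mpg"].sum() / len(subset)
print(np.isclose(mean_from_sum, subset["mpg"].mean()))
```

The column sums shown above can be checked the same way: 24378.0 / 316 gives the reported year mean of 77.146.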
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

(e) Graphical examination of predictors

In [22]:
Observations:
- acceleration appears to be roughly normally distributed.
- A higher number of cylinders is associated with
 • lower : mpg, acceleration
 • higher : displacement, horsepower, weight
- There has been a decline in the number of new models coming out with 8 cylinders.
- Newer models are lighter and have less horsepower (presumably because of decreased weight).
- mpg appears to have a strong, non-linear, negative relationship with each of displacement, horsepower and weight.
- Fuel efficiency of new models has improved over the years. Minimum mpg has almost doubled in the span of 12 years.
 Mazda GLC, of Japanese origin, is the car with the highest mpg, 46.6, and came out in 1980.
- displacement, horsepower and weight appear to have a strong positive correlation with each other.
- A moderate negative correlation may exist between horsepower and acceleration.
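Visual impressions like these can be cross-checked numerically with a correlation matrix. The sketch below uses only the eight rows printed earlier in this notebook, so the values are illustrative and not those of the full dataset; `auto` is a hypothetical name. Pearson correlation measures linear association only, so a rank-based (Spearman) version is included for the non-linear but monotonic relationships.

```python
import pandas as pd

# Eight rows copied from the outputs shown earlier; `auto` is hypothetical.
auto = pd.DataFrame({
    "mpg": [18.0, 15.0, 18.0, 25.0, 21.0, 40.9, 23.6, 34.5],
    "displacement": [307.0, 350.0, 318.0, 98.0, 200.0, 85.0, 140.0, 100.0],
    "weight": [3504, 3693, 3436, 2046, 2875, 1835, 2905, 2320],
    "acceleration": [12.0, 11.5, 11.0, 19.0, 17.0, 17.3, 14.3, 15.8],
})

# Pearson correlation: strength of linear association.
pearson = auto.corr()

# Spearman correlation: Pearson correlation computed on the ranks.
spearman = auto.rank().corr()

print(pearson.round(2))
print(spearman.round(2))
```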
In [23]:
In [24]:
[Figure: "Scatter Plot" of mpg (y-axis) against year (x-axis), with points grouped by origin]

origin : 1 - American, 2 - European, 3 - Japanese

In [25]:
Observations:
- Cars of European (2) and Japanese (3) origin overlap on every criterion in the contour plots, whereas American cars (1) have a larger and distinct spread.
- Clear distinctions can be seen between American and the other 2 carmakers in displacement, horsepower and weight.
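A grouped summary is one way to put numbers on the differences between origins. This sketch uses the eight rows printed earlier in the notebook (none of which is Japanese), so it only illustrates the technique; `auto` and `labels` are hypothetical names.

```python
import pandas as pd

# Eight rows copied from the outputs shown earlier; `auto` is hypothetical.
auto = pd.DataFrame({
    "origin": [1, 1, 1, 1, 1, 2, 1, 2],
    "displacement": [307.0, 350.0, 318.0, 98.0, 200.0, 85.0, 140.0, 100.0],
    "weight": [3504, 3693, 3436, 2046, 2875, 1835, 2905, 2320],
})

# Map the numeric codes to the names given in the variable description.
labels = {1: "American", 2: "European", 3: "Japanese"}
labelled = auto.assign(origin=auto["origin"].map(labels))

# Mean displacement and weight for each origin.
by_origin = labelled.groupby("origin")[["displacement", "weight"]].mean()
print(by_origin)
```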
'################ workings
In [27]:
Out[27]:
3      4
4    199
5      3
6     83
8    103
Name: cylinders, dtype: int64
In [28]:
In [29]:
Out[29]:
cylinders 3 4 5 6 8
year
70 0 7 0 4 18
71 0 12 0 8 7
72 1 14 0 0 13
73 1 11 0 8 20
74 0 15 0 6 5
75 0 12 0 12 6
76 0 15 0 10 9
77 1 14 0 5 8
78 0 17 1 12 6
79 0 12 1 6 10
80 1 23 1 2 0
81 0 20 0 7 1
82 0 27 0 3 0
'################ workings'
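Tables like the two in these workings are typically produced with `value_counts` and `pd.crosstab`. A sketch on the eight rows printed earlier in the notebook, with `auto` as a hypothetical name:

```python
import pandas as pd

# Eight rows copied from the outputs shown earlier; `auto` is hypothetical.
auto = pd.DataFrame({
    "year": [70, 70, 70, 71, 74, 80, 80, 81],
    "cylinders": [8, 8, 8, 4, 6, 4, 4, 4],
})

# Number of cars per cylinder count, ordered by cylinder count.
counts = auto["cylinders"].value_counts().sort_index()
print(counts)

# Cars per model year (rows) and cylinder count (columns).
table = pd.crosstab(auto["year"], auto["cylinders"])
print(table)
```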
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

(f) Variables useful in predicting mpg

Except for acceleration, all the variables display some sort of relationship or trend with mpg, whether positive or negative.
Positive : year
Negative : cylinders, displacement, horsepower, weight
Non-directional : origin
They can be taken into account for predicting mpg, after adjusting for collinearity.
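One common way to gauge collinearity before fitting a model is the variance inflation factor (VIF). For standardized predictors, the VIF of each variable is the corresponding diagonal entry of the inverse of the predictors' correlation matrix. The sketch below is an illustration only, not part of the original analysis: it uses the eight rows printed earlier in the notebook, and `predictors` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Eight rows copied from the outputs shown earlier; `predictors` is hypothetical.
predictors = pd.DataFrame({
    "displacement": [307.0, 350.0, 318.0, 98.0, 200.0, 85.0, 140.0, 100.0],
    "weight": [3504, 3693, 3436, 2046, 2875, 1835, 2905, 2320],
    "acceleration": [12.0, 11.5, 11.0, 19.0, 17.0, 17.3, 14.3, 15.8],
})

# Correlation matrix of the predictors.
corr = predictors.corr().to_numpy()

# VIF_j is the j-th diagonal entry of the inverse correlation matrix.
vif = pd.Series(np.diag(np.linalg.inv(corr)), index=predictors.columns)
print(vif.round(2))
```

A VIF of 1 means a predictor is uncorrelated with the others; larger values indicate that it is increasingly well explained by them.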
